Statistical Models for Presence-Only Data: Finite-Sample Equivalence and Addressing Observer Bias

Authors

  • William Fithian
  • Trevor Hastie

Abstract

Statistical modeling of presence-only data has attracted much recent attention in the ecological literature, leading to a proliferation of methods, including the inhomogeneous Poisson process (IPP) model [15], maximum entropy (Maxent) modeling of species distributions [12] [9] [10], and logistic regression models. Several recent articles have shown the close relationships between these methods [1] [15]. We explain why the IPP intensity function is a more natural object of inference in presence-only studies than occurrence probability (which is only defined with reference to quadrat size), and why presence-only data allow estimation of only relative, not absolute, intensities. All three of the above techniques amount to parametric density estimation under the same exponential family model. We show that the IPP and Maxent models give exactly the same estimate for this density, but logistic regression in general produces a different estimate in finite samples. When the model is misspecified, logistic regression and the IPP may have substantially different asymptotic limits with large data sets. We propose “infinitely weighted logistic regression,” which is exactly equivalent to the IPP in finite samples. Consequently, many already-implemented methods extending logistic regression can also extend the Maxent and IPP models in directly analogous ways using this technique. Finally, we address the issue of observer bias, modeling the presence-only data set as a thinned IPP. We discuss when the observer bias problem can be solved by regression adjustment, and additionally propose a novel method for combining presence-only and presence-absence records from one or more species to account for it.
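The finite-sample claims above can be checked numerically. The following is a minimal NumPy sketch, not the authors' code, built on stated assumptions: covariates are simulated, the study region has unit area, and the integral in the IPP likelihood is approximated by equally weighted background (quadrature) points. All function names (`newton_max`, `fit_ipp`, `fit_maxent`, `fit_weighted_logistic`) are hypothetical. It fits the log-linear intensity λ(z) = exp(α + β'x(z)) three ways and compares only the slopes β, since the intercept α depends on the assumed area or background weight and is not identifiable from presence-only data. Logistic regression gives presences weight 1 and background points weight W; a large finite W stands in for the W → ∞ limit.

```python
import numpy as np


def newton_max(loglik, grad_hess, theta0, tol=1e-10, max_iter=200):
    """Maximize a smooth concave function by Newton-Raphson with step-halving."""
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(max_iter):
        g, H = grad_hess(theta)
        step = -np.linalg.solve(H, g)  # Newton direction (H is negative definite)
        if np.max(np.abs(step)) < tol:
            return theta + step
        f0 = loglik(theta)
        slack = 1e-10 * (1.0 + abs(f0))  # tolerate floating-point noise in f
        t = 1.0
        # Halve the step while it lowers f (or overflows to -inf / nan)
        while (not loglik(theta + t * step) >= f0 - slack) and t > 1e-8:
            t /= 2
        theta = theta + t * step
    return theta


def logsumexp(a):
    c = a.max()
    return c + np.log(np.exp(a - c).sum())


# Hypothetical simulated data: two covariates on a region of unit area
rng = np.random.default_rng(0)
m, n, area = 5000, 300, 1.0
Xb = rng.normal(size=(m, 2))  # covariates at m uniform background points
beta_true = np.array([1.0, -0.5])
prob = np.exp(Xb @ beta_true)
prob /= prob.sum()
Xp = Xb[rng.choice(m, size=n, p=prob)]  # n presences, density proportional to exp(x'beta)
Zp = np.column_stack([np.ones(n), Xp])  # prepend an intercept column
Zb = np.column_stack([np.ones(m), Xb])


def fit_ipp():
    """IPP: maximize sum_i eta_i - sum_j w_j exp(eta_j), with eta = alpha + x'beta."""
    w = np.full(m, area / m)  # quadrature weights: area split evenly over background points

    def loglik(th):
        return Zp.sum(axis=0) @ th - w @ np.exp(Zb @ th)

    def grad_hess(th):
        lam = w * np.exp(Zb @ th)
        return Zp.sum(axis=0) - Zb.T @ lam, -(Zb * lam[:, None]).T @ Zb

    return newton_max(loglik, grad_hess, np.array([np.log(n / area), 0.0, 0.0]))


def fit_maxent():
    """Maxent: presence density over background points proportional to exp(x'beta)."""
    def loglik(b):
        return (Xp @ b).sum() - n * logsumexp(Xb @ b)

    def grad_hess(b):
        eta = Xb @ b
        p = np.exp(eta - eta.max())
        p /= p.sum()
        xbar = p @ Xb
        H = -n * ((Xb * p[:, None]).T @ Xb - np.outer(xbar, xbar))
        return Xp.sum(axis=0) - n * xbar, H

    return newton_max(loglik, grad_hess, np.zeros(2))


def fit_weighted_logistic(W):
    """Logistic regression: presences y=1 (weight 1), background y=0 (weight W)."""
    Z = np.vstack([Zp, Zb])
    y = np.concatenate([np.ones(n), np.zeros(m)])
    v = np.concatenate([np.ones(n), np.full(m, W)])

    def loglik(th):
        eta = Z @ th
        return v @ (y * eta - np.logaddexp(0.0, eta))

    def grad_hess(th):
        mu = 1.0 / (1.0 + np.exp(-(Z @ th)))
        return Z.T @ (v * (y - mu)), -(Z * (v * mu * (1 - mu))[:, None]).T @ Z

    # Start the intercept near log(n / (W m)) so Newton need not walk there
    return newton_max(loglik, grad_hess, np.array([np.log(n / (W * m)), 0.0, 0.0]))


ipp_slopes = fit_ipp()[1:]
print("IPP slopes:   ", ipp_slopes)
print("Maxent slopes:", fit_maxent())
for W in (1.0, 1e2, 1e6):
    print(f"logistic, W={W:g}:", fit_weighted_logistic(W)[1:])
```

In this setup the Maxent and IPP slopes should agree to numerical precision, because with equal quadrature weights profiling α out of the IPP likelihood leaves exactly the Maxent objective. Ordinary logistic regression (W = 1) should give somewhat different slopes, and the weighted-logistic slopes should approach the IPP ones as W grows.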

Similar resources

The point process use-availability or presence-only likelihood and comments on analysis.

1. Use-availability and presence-only analyses are synonyms. Both require two samples (one containing known locations, one containing potential locations), both estimate the same parameters, and both use the same fundamental likelihood. 2. Use-availability and presence-only designs compare characteristics of points where an organism was located to those where the organism could have been locate...

Bias properties of Bayesian statistics in finite mixture of negative binomial regression models in crash data analysis.

Factors that cause heterogeneity in crash data are often unknown to researchers and failure to accommodate such heterogeneity in statistical models can undermine the validity of empirical results. A recently proposed finite mixture for the negative binomial regression model has shown a potential advantage in addressing the unobserved heterogeneity as well as providing useful information about f...

Finite-Sample Equivalence in Statistical Models for Presence-Only Data.

Statistical modeling of presence-only data has attracted much recent attention in the ecological literature, leading to a proliferation of methods, including the inhomogeneous Poisson process (IPP) model, maximum entropy (Maxent) modeling of species distributions and logistic regression models. Several recent articles have shown the close relationships between these methods. We explain why the ...

Model-Based Control of Observer Bias for the Analysis of Presence-Only Data in Ecology

Presence-only data, where information is available concerning species presence but not species absence, are subject to bias due to observers being more likely to visit and record sightings at some locations than others (hereafter "observer bias"). In this paper, we describe and evaluate a model-based approach to accounting for observer bias directly--by modelling presence locations as a functio...

New Technical Efficiency Estimates with Improved Bootstrap Confidence Interval Coverage

Bootstrap confidence intervals on fixed-effects efficiency estimates from finite-sample panel data models exhibit low coverage probabilities, because the traditional estimate involves a "max" operator that induces a finite sample bias. Attempts to bootstrap confidence intervals for the traditional estimate have focused on correcting bias. Rather than addressing this bias at the bootstrap stage,...

Publication date: 2012